Search annotation and personalization services

ABSTRACT

Various features are disclosed for storing and providing access to event data reflective of user-generated events, including events associated with search query submissions by users. One such feature enables users to annotate search results, to later recall and view these annotations, and to publish the annotations to other users. Another feature involves recording event data reflective of search result viewing events of users, and using this event data to personalize search results pages for particular users.

PRIORITY CLAIM

This application is a continuation of U.S. application Ser. No. 11/325,009, filed Jan. 4, 2006, which is a division of U.S. patent application Ser. No. 10/612,395, filed Jul. 2, 2003, the disclosure of which is hereby incorporated by reference.

BACKGROUND

1. Field

The present disclosure relates to computer-based systems for capturing, persistently storing, and serving event data reflective of user-generated events, including search-related events.

2. Description of the Related Art

Web site systems commonly include one or more mechanisms for capturing and storing information about the browsing activities or “clickstreams” of users. The captured clickstream data is commonly used to personalize web pages for recognized users. Typically, however, the captured clickstream data either provides only very limited information about each user's browsing history, or is captured in a format that is of only limited use for personalization.

For example, some web sites maintain a real-time record of each item selection, browse node selection, and search query submission performed by each user during browsing of an electronic catalog. Such browse histories are useful, for example, for generating personalized item recommendations, and for displaying navigation histories to assist users in returning to previously accessed content. However, these types of records typically lack the level of detail and structure desired for flexibly building new types of real-time personalization applications.

Some systems also maintain web server access logs (“web logs”) that contain a chronological record of every HTTP request received by the web site, together with associated timestamp and user ID information. For web pages that are generated dynamically, the web logs may also record the identities of items presented to users within such pages (commonly referred to as item “impressions”). While these logs typically contain more detailed browse history information, they are maintained in a format that is poorly suited for the real-time extraction and analysis of users' clickstream histories. Although web logs can be mined for information useful to various personalization functions, the task of mining a large web log can take many hours or days, potentially rendering the extracted data stale by the time it is available for use. Further, much of the detailed information contained in a web log is disregarded during the mining process, and is thus effectively lost for purposes of personalization.

SUMMARY

An event history server system is disclosed that persistently stores event data descriptive of events that occur during browsing sessions of web site users. The event data is stored in association with the IDs of the corresponding users, and is made available in real time to web site applications that may use the event data to personalize web pages for specific users. In one embodiment, the event history server records event data descriptive of substantially every selection event (e.g., mouse click) of every user of a web site. The event history server may also record event data descriptive of other types of browsing events, such as impressions (i.e., items presented to users on dynamically generated web pages) and mouse-over events.

The event data stored for each recorded event is preferably stored within a database as an event object. Each event object may, for example, include identifiers of the general event type (e.g., mouse click, impression, etc.) and type of display element involved (e.g., catalog item, browse node, search result item, etc.), an event value (e.g., the text of a selected URL), a timestamp indicating the event's date and time of occurrence, and associated context information. A query interface of the event history server enables applications, such as personalization applications, to retrieve a particular user's event data by general event type, type of display element involved, time of occurrence, and possibly other criteria. The query interface also preferably supports queries with the semantics of “has user X accessed URL Y before?” and “when did user X access URL Y?” The query interface may more generally support queries with the semantics of “does an event of type T and value V exist within the event history of user X?”

In one embodiment, users are provided an option to annotate particular search results, and to publish the annotations to other users. The annotations may be stored as events that can be retrieved via the query interface. The system may also provide functionality for users to organize and search their respective event histories.

Neither this summary nor the following detailed description purports to define the invention. The invention is defined by the claims.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates a web site system that includes an event history server system according to one embodiment of the invention.

FIG. 2 illustrates a set of software components that may be executed by the cache layer servers and storage layer servers of FIG. 1.

FIG. 3 illustrates an example web search results page in which search results are annotated to indicate the user's prior browsing history with respect to such items.

FIG. 4 illustrates a process for identifying search result items that have previously been accessed by the user.

FIG. 5 illustrates a process for determining whether the user previously submitted the same search query, and if so, whether any new search result items have been located in the current search.

FIG. 6 illustrates the use of a Bloom filter to efficiently determine whether a user has accessed a particular object.

FIG. 7 illustrates how the event history server may be used to collect event data reported by a browser component, such as a browser toolbar.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

A preferred embodiment of a web site system that includes an event history server system will now be described with reference to the drawings. As will be recognized, many of the inventive features embodied within the disclosed system may be implemented or used without others. Numerous implementation details will be set forth in the following description in order to illustrate, but not limit, the invention.

I. Overview

FIG. 1 illustrates a web site system 30 that includes an event history server system 32 (“event history server”) according to one embodiment of the invention. The event history server 32 provides a service for persistently recording, and providing real-time access to, event data indicative of events that occur during browsing sessions of users. The event data preferably describes the clickstreams of the users, and may also describe other types of events such as mouse-over and impression events. As described below, the event history server 32 also provides an API (application program interface) through which applications and services can flexibly retrieve, and request information about, the event data of specific users.

As illustrated in FIG. 1, the web site system 30 includes one or more web server machines 34 that receive and process requests from user computers 36 and/or other types of devices (PDAs, mobile phones, etc.). Processes running on the web server machines 34 communicate with various web applications 38, which may be implemented as web services. These applications 38 may run on the web server machines 34, but more typically run on separate server machines (not shown). The types of web applications provided within a given system will typically depend upon the nature and purpose of the system. For example, an Internet search engine site may include one or more search applications 38 for enabling users to conduct keyword searches of an index of web pages. An electronic commerce site may include web applications for performing such tasks as conducting searches of an electronic catalog, browsing an electronic catalog via a browse tree, generating personalized recommendations, placing orders, and managing personal account data. For purposes of this description, it may be assumed that at least some of the web applications 38 supply personalized web page content based on user-specific event data stored by the event history server 32.

As depicted in FIG. 1, each web server machine 34 preferably runs an event reporting component 35 that sends updates to the event history server 32 in response to actions performed by online users. (A given user need not be an individual, but may, for example, be a group of individuals that share a user computer 36 or a user account.) Each such update contains or otherwise specifies event data descriptive of a particular event that has occurred during a browsing session of a user, such as a mouse click, a search query submission, or an impression. The event reporting component 35 may include a set of APIs (application program interfaces) for making calls to the event history server 32, and may include configuration parameters specifying the types of actions for which events/updates are to be generated. APIs are also provided for allowing applications 38 to submit queries to the event history server 32. As described in section V below, browsing events may additionally or alternatively be reported to the event history server 32 by a browser-based event reporting component (FIG. 7) that runs on the user computers 36.

The set of data stored by the event history server 32 for a particular event is preferably stored as an “event object.” In one embodiment, for example, the event history server 32 persistently stores event objects describing substantially every selection action or “mouse click” of every recognized user of the web site system 30. This information may include, for example, the URL accessed, the time/date of the access, and associated context information. The event history server may also record impression events reflective of specific items presented to users within dynamically generated web pages. For example, when a user views a dynamically-generated web page that includes a personalized list of items selected from a database, the event history server may store an event object for each such item, or may store a single event object that contains the list of items. Additional examples of the types of events that may be recorded, and of the event data that may be recorded for such events, are provided below. The event objects are preferably stored and indexed within the event history server 32 to permit retrieval based on time-of-occurrence, general event type (e.g., mouse click versus impression), type of display element involved (e.g., catalog item, browse node, or Web search result URL), user ID, and other object properties.

The event data captured by the event history server 32 reflects actions performed by users during browsing of a particular web site or set of web sites hosted by the web site system 30. This captured data may, in some embodiments, also reflect actions performed by users during browsing of external, independent web sites. For example, users may be permitted or required to download to their computers 36 a browser plug-in, such as a browser toolbar, that reports all URL accesses (and possibly other types of events) to the event history server 32 (see FIG. 7, discussed below).

Some or all of the web applications 38 preferably act as clients of the event history server 32. As depicted in FIG. 1, the web applications interact with the event history server 32 primarily by sending event queries to the event history server 32 in order to retrieve or obtain information about specific events. These queries preferably include requests for fixed-size blocks of event data, such as “recall last N events of type T for user Y,” or “recall next N events of type T for user Y.” For example, a web application 38 may request the event objects for the last fifty search queries submitted by a particular user, or may request the event objects for the last fifty browse nodes viewed by a particular user. The retrieved event objects may be used by the web applications 38 for any suitable purpose, such as generating personalized item recommendations or displaying browse history data to users to facilitate navigation.
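The fixed-size block queries described above can be sketched as follows. This is a minimal illustration rather than the disclosed interface: the `Event` tuple, the `recall_last` function, the subject codes, and the use of a timestamp cursor to express "recall next N" are all assumptions made for the example.

```python
from typing import List, NamedTuple, Optional


class Event(NamedTuple):
    subject: str   # hypothetical subject code, e.g. "web_search_query"
    value: str     # e.g. the text of the query
    time: int      # seconds since 1970


def recall_last(history: List[Event], subject: str, n: int,
                before: Optional[int] = None) -> List[Event]:
    """Return up to n events of one subject, newest first.

    Passing the oldest timestamp of the previous block as `before`
    fetches the following block, which is one possible reading of
    "recall next N events".
    """
    matches = [e for e in history
               if e.subject == subject and (before is None or e.time < before)]
    matches.sort(key=lambda e: e.time, reverse=True)
    return matches[:n]


history = [
    Event("web_search_query", "comet Halley", 100),
    Event("browse_node", "node-17", 150),
    Event("web_search_query", "lord of the rings", 200),
    Event("web_search_query", "bloom filter", 300),
]
first_block = recall_last(history, "web_search_query", 2)
next_block = recall_last(history, "web_search_query", 2,
                         before=first_block[-1].time)
```

A timestamp cursor skips events that share the boundary timestamp; a production interface would more likely page on a unique event key.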

The event history server 32 also preferably supports queries of the semantic form “does an event of type T and value V exist within the event history of user Y?” and “when did an event of type T and value V occur within the event history of user Y?” For example, a web application can query the event history server 32 to find out whether a particular user has accessed a particular URL before, and if so, when. As described below, one application of this feature involves generating a personalized web search results page (see FIG. 3) that highlights the search results/URLs that have already been accessed by the particular user, and indicates when the viewed URLs were last accessed. Additional examples of the types of queries that may be supported by the event history server 32, and of applications which make use of the captured event data, are provided below.

Although the event queries are depicted in FIG. 1 as emanating from the web applications 38, the event history server 32 may also respond to event queries (and possibly updates) from other types of components. For example, a data mining application may retrieve and analyze event data of users for purposes of enhancing browsing of the web site; and a security application or administrator may retrieve clickstream data to analyze a security breach. Application components and services that send updates and/or queries to the event history server 32 are referred to herein as “clients,” and are depicted in FIG. 1 as being part of a client layer 39.

As illustrated in FIG. 1, the event history server 32 consists of two layers: a cache layer 40 and a persistent storage layer 44. The cache layer 40 is made up of a set of physical cache layer servers 42, each of which runs service-level software 70 (FIG. 2) for responding to updates and queries from clients. Each cache layer server 42 includes a cache 43 that temporarily stores event data in the form of event objects. The cache 43 may be implemented using non-volatile disk storage, random access memory, or a combination thereof. As described below, the cache 43 may also store user-specific Bloom filters for responding to certain types of queries. Communications between the client layer 39, the cache layer 40 and the persistent storage layer 44 preferably occur over a local area network.

The cache layer servers 42 are preferably partitioned by browsing session ID, meaning that each such server 42 only stores event data (event objects) associated with its respective range or group of session IDs. Thus, for example, when a user starts a new browsing session, that session is assigned to a particular cache layer server 42 which thereafter services all updates and event queries corresponding to that browsing session. In one implementation, a total of nine dual-CPU cache layer servers 42 are provided within the cache layer 40, one of which is used as a hot spare. The caches 43 are preferably implemented as stateless, write-through caches, facilitating the addition, removal, backup, and rebooting of the machines 42.
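One way to picture session-based partitioning is a stable mapping from session ID to cache layer server. The sketch below assumes hash-based assignment across eight active machines (nine less the hot spare); the text only says each server owns a "range or group" of session IDs, so the hashing scheme, the server names, and the `server_for_session` function are illustrative assumptions.

```python
import hashlib

# Hypothetical names for the eight active servers; the ninth machine
# in the described implementation is held back as a hot spare.
ACTIVE_SERVERS = [f"cache-{i}" for i in range(8)]


def server_for_session(session_id: str) -> str:
    """Map a session ID to one cache layer server, stably across calls."""
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(ACTIVE_SERVERS)
    return ACTIVE_SERVERS[index]


assigned = server_for_session("session-12345")
```

Because the mapping depends only on the session ID, every update and query for a session reaches the same server without any shared routing table.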

As described below, the cache layer 40 acts essentially as an intermediary between the client layer 39 and the persistent storage layer 44, allowing the system to operate at higher throughput levels. Although the use of a cache layer 40 is preferred, the event history server 32 may alternatively be implemented without a cache layer. Further, where the cache layer 40 is provided, it may be bypassed where caching would not be beneficial, such that some requests are serviced directly by the persistent storage layer 44.

With further reference to FIG. 1, the persistent storage layer 44 is made up of a set of physical storage layer servers 46. In the illustrated embodiment, two storage layer servers 46 are provided, although a greater or lesser number may be used. Each storage layer server 46 is responsible for maintaining a complete copy of the event data persistently stored by the system 32. Thus, the storage layer servers 46 are mirrors of each other when two or more are provided. Because the storage layer servers 46 are mirrored, a single storage layer server 46 can go down or be taken off line without affecting the overall functionality of the event history service.

Each storage layer server 46 may, for example, include several terabytes of disk drive storage. If approximately 100M events are recorded per day, and an average of thirty bytes of data are stored for each recorded event, each storage layer server 46 will store approximately 3 GB (gigabytes) of event data per day, or about one terabyte per year. Under such a usage scenario, each storage layer server 46 is capable of storing and serving several years' worth of event data. When the storage capacity of a storage layer server 46 is reached, new disk drives may be added to the storage layer to increase its capacity, and/or old event data may be purged or archived.
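The sizing estimate above can be checked with a few lines of arithmetic. The figures come from the paragraph itself; decimal units (1 GB = 10^9 bytes) and the omission of index and database overhead are assumptions of this sketch.

```python
EVENTS_PER_DAY = 100_000_000   # "approximately 100M events"
BYTES_PER_EVENT = 30           # "an average of thirty bytes"

bytes_per_day = EVENTS_PER_DAY * BYTES_PER_EVENT
gb_per_day = bytes_per_day / 1e9           # 3.0 GB per day
tb_per_year = bytes_per_day * 365 / 1e12   # roughly 1.1 TB per year
```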

As described below and depicted in FIG. 2, each storage layer server 46 runs database software 62 for storing and managing the event data, and runs service-level code 60 for processing updates and queries received from the cache layer 40. The storage layer servers 46 may be implemented using low cost commodity hardware, and may use Berkeley databases, or another type of relatively low cost database, to store the event data.

The event history server 32 responds to updates generally as follows. When an update is sent to the cache layer 40 by a web server machine 34, the cache layer server 42 assigned to the corresponding session ID updates its respective cache 43 to include the event object specified by the update. If the event is for a recognized user, the cache layer server 42 also sends the update to each of the storage layer servers 46 (preferably using a publish-subscribe protocol), and each such server 46 stores the associated event object within its persistent storage 64 in association with the corresponding user ID. Each cache layer server 42 preferably aggregates multiple events/updates for sending to the storage layer servers 46, so that these servers 46 receive updates from the cache layer in batches. The event objects are stored in the cache 43 in association with the corresponding session ID and/or user ID.

If the event data specified within an update is for an unrecognized user, the cache layer server 42 stores the event object in its cache 43 (in association with the corresponding session ID), but does not send the update to the persistent storage layer 44. The user may be unrecognized if, for example, the user (1) is new to the web site, (2) is accessing the web site from a particular computer 36 for the first time and has not logged in, or (3) is accessing the web site from a computer 36 that is configured to block cookies and has not logged in. Throughout a session, the cache layer will thus collect event data for an unrecognized user.

If an unrecognized user logs in, creates an account, or otherwise becomes recognized during the browsing session, the associated cache layer server 42 associates collected event data for that user with the user's ID. In addition, the cache layer server 42 sends a series of updates to the storage layer servers 46 to persistently store this collected event data in association with the user ID. The system thus allows a user's events to be persistently captured even though the user may not be recognized at the time such events occur. Further, during the period in which the user is unrecognized, clients of the event history server 32 can retrieve and request information about the cached event data of the unrecognized user based on the user's session ID.
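The update path for recognized and unrecognized users can be sketched as follows. The class and method names (`StorageServer`, `CacheServer`, `update`, `recognize`) are hypothetical, events are reduced to plain strings, and the batching and publish-subscribe delivery mentioned above are deliberately left out to keep the flow visible.

```python
from collections import defaultdict


class StorageServer:
    """Stand-in for one mirrored storage layer server."""

    def __init__(self):
        self.by_user = defaultdict(list)

    def store(self, user_id, events):
        self.by_user[user_id].extend(events)


class CacheServer:
    """Stand-in for one cache layer server."""

    def __init__(self, storage_servers):
        self.storage_servers = storage_servers
        self.by_session = defaultdict(list)   # cached events per session
        self.session_user = {}                # session ID -> user ID, once known

    def update(self, session_id, event):
        # Always cache by session; persist only when the user is recognized.
        self.by_session[session_id].append(event)
        user_id = self.session_user.get(session_id)
        if user_id is not None:
            self._persist(user_id, [event])

    def recognize(self, session_id, user_id):
        # On login, flush everything collected so far under the user ID.
        self.session_user[session_id] = user_id
        self._persist(user_id, list(self.by_session[session_id]))

    def _persist(self, user_id, events):
        # Every mirror receives the same write.
        for server in self.storage_servers:
            server.store(user_id, events)
```

Calling `update` twice, then `recognize`, then `update` again leaves all three events in both mirrors, while nothing is persisted before `recognize` runs.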

The event history server 32 responds to event queries from clients generally as follows. When a cache layer server 42 receives an event query from a client, it initially checks its respective cache 43 to determine whether the relevant event data resides therein, and responds to the query if the data is present. If the relevant event data does not reside in the cache 43, the query is passed to one of the storage layer servers 46. A load balancing algorithm may be used to select between the storage layer servers 46 for this purpose. The selected storage layer server 46 responds to the query by generating a response (which may include requested event objects), and returning this response to the cache layer server 42 from which the query was received. The cache layer server 42 then passes this response to the requesting client.

If the response includes event data retrieved from persistent storage, the cache layer server 42 stores this event data in its respective cache 43 by default. In the preferred embodiment, an event query may indicate that the retrieved event data should not be cached; this feature may be used, for example, to inhibit caching when a large quantity of event data is read from persistent storage for purposes of off-line data mining.
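The query path just described amounts to a read-through cache with an opt-out flag. In the sketch below, the mirrored storage servers are modeled as plain dictionaries, round-robin stands in for the unspecified load balancing algorithm, and the `CacheLayer` class, its `cacheable` parameter, and the `storage_reads` counter are assumptions added for illustration.

```python
class CacheLayer:
    """Read-through cache in front of mirrored storage (modeled as dicts)."""

    def __init__(self, storage_servers):
        self.storage_servers = storage_servers
        self.cache = {}
        self.storage_reads = 0   # counts queries passed to the storage layer

    def query(self, key, cacheable=True):
        if key in self.cache:
            return self.cache[key]
        # Cache miss: pick a mirror (round-robin here, as an assumption).
        server = self.storage_servers[self.storage_reads % len(self.storage_servers)]
        self.storage_reads += 1
        result = server.get(key, [])
        if cacheable:
            self.cache[key] = result
        return result


mirrors = [{"user-x": ["e1", "e2"]}, {"user-x": ["e1", "e2"]}]
layer = CacheLayer(mirrors)
bulk = layer.query("user-x", cacheable=False)   # e.g. an off-line mining read
first = layer.query("user-x")                   # miss, then cached
second = layer.query("user-x")                  # served from the cache
```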

As indicated by the foregoing, the illustrated embodiment of the event history server 32 captures data descriptive of browsing events as such events occur, and makes such event data available to personalization applications 38 in real time (i.e., substantially immediately). In addition, the event data is made available in a form that allows applications 38 to limit their queries to the specific types and items of event data needed to perform specific personalization tasks. Further, unlike systems that rely on the results of an off-line data mining analysis, the applications 38 have access to the “raw” event data itself, as opposed to merely a summary of such data. Users of the event history server may, in some embodiments, be given the option to control whether their respective browsing histories are to be recorded by the event history server (e.g., an opt-in or opt-out option may be provided). Additional features and benefits of the disclosed architecture are discussed below.

II. Event Object Content and Retrieval

As indicated above, the set of data stored for a given event is stored by the event history server 32 as an event object. In one embodiment, each event object includes the following components: Subject, Value, Tag, and Time. Each of these components is described below. In one embodiment, these components are used to capture data regarding three general types of events: mouse clicks, impressions, and mouse-over events.

The Subject of the event object is a code that indicates, for mouse click, mouse-over, and impression events, the type of display element involved (e.g., an item, a browse node, an external URL, or a link for submitting a search query). One or more subject codes may also be defined for describing search query submissions from users. The Subject of an event, together with the event's Tag (described below), fully specifies the type of the event (e.g., mouse click of browse node, impression of catalog item, etc.).

Table 1 below provides examples of some of the event subjects that may be supported, and indicates the data stored in the Value field for each such event subject. These examples assume that the web site system 30 hosts an electronic catalog that may be browsed and searched by users to locate items (products, news articles, etc.), and also assume that the web site system 30 implements a search engine for locating external web sites and pages. As will be recognized, the types of events recorded within a particular web site system 30 will depend largely on the nature and purpose of that system, and may vary significantly from those listed in Table 1.

Further, the event types may be varied or extended in order to support additional application features. For example, in one embodiment which is not represented in Table 1, an “annotation” event type is defined for purposes of storing annotations entered by users. One application for annotation events involves allowing users to annotate their respective search results, and to later recall and review such annotations. Users may also be given the ability to publish their annotations to other users. Unlike mouse clicks, impressions, and mouse-over events, annotation events represent explicit requests by users to store event data for later retrieval.

TABLE 1

| Event Subject | Description | Value |
| --- | --- | --- |
| Catalog item | User selection, impression, or mouse-over of a catalog item | Item ID |
| Browse node | User selection, impression, or mouse-over of a browse node (item category) | Browse node ID |
| Internal search query | Search query submission for conducting an internal search, such as a search for items in an electronic catalog | Text of search term(s) entered by user |
| Web search query | Search query submission for conducting a general Web/Internet search that is not limited in scope to any particular set of Web sites | Text of search term(s) entered by user |
| Web search URL | URL selected by user from web search results page. This information may be captured, for example, by initially directing the user's browser to an internal URL used for tracking purposes, and then redirecting the browser to the external URL/web site selected by the user. | Two separate strings are stored, one of which contains the text of the URL, and the other of which contains the page's display title. Strings that exceed a particular length are truncated. |
| Feature | User access to a particular web site feature, such as an item recommendations service | Feature ID |
| Other URL | URL access event that does not fall within one of the above categories | Text of URL and associated display title |

The Tag may be implemented as a set of flags indicating some or all of the following: (1) the general type of the event (e.g., mouse click, impression, or mouse-over), (2) whether the Value portion of the event has been truncated (e.g., because of the excessive length of a particular text string), (3) whether the event is “undisplayable,” meaning that it cannot be viewed by the user, and (4) whether the event is transient versus persistent.

A Tag's “undisplayable” flag may be used, for example, to allow users to effectively remove events from their viewable event histories. For example, the web site system 30 may provide an application 38 and associated user interface through which users can view and search their respective event histories, and “delete” selected events from such histories. When a user deletes a particular event (such as a particular search query submission or browse node access), the corresponding event object is marked by the event history server 32 as “undisplayable” to prevent the user from viewing the associated event, but remains accessible to clients of the event history server 32.

The “transient/persistent” flag may be used to mark those events that can be permanently deleted from persistent storage at a certain point of time. This feature may be used to purge event objects that are of little or no value after a certain time period, so that the associated persistent storage is made available for storing other data.

The Time component is a numerical value indicating the time of occurrence of the event, and may be expressed, for example, in seconds since 1970. When the transient/persistent flag is set to “transient,” an additional value may be included specifying the time of expiration of the event object. Expired event objects may be deleted from persistent storage periodically by a background task, or using any other appropriate method.
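The four components and the Tag flags can be illustrated with a small data model. The flag names and bit values, the string subject codes, and the helper functions `purge_expired` and `visible_to_user` are assumptions for this sketch; the text specifies only which facts the Tag records, not how they are encoded.

```python
from dataclasses import dataclass
from enum import IntFlag
from typing import List, Optional


class Tag(IntFlag):
    # General event type
    CLICK = 1
    IMPRESSION = 2
    MOUSE_OVER = 4
    # Status flags
    TRUNCATED = 8        # Value was shortened
    UNDISPLAYABLE = 16   # hidden from the user, still visible to clients
    TRANSIENT = 32       # may be purged after `expires`


@dataclass
class Event:
    subject: str                    # type of display element (hypothetical code)
    value: str
    tag: Tag
    time: int                       # seconds since 1970
    expires: Optional[int] = None   # only meaningful for transient events


def purge_expired(events: List[Event], now: int) -> List[Event]:
    """Drop transient events whose expiration time has passed."""
    return [e for e in events
            if not (Tag.TRANSIENT in e.tag
                    and e.expires is not None and e.expires <= now)]


def visible_to_user(events: List[Event]) -> List[Event]:
    """Events a user may see in his or her own history view."""
    return [e for e in events if Tag.UNDISPLAYABLE not in e.tag]


events = [
    Event("web_search_query", "comet Halley", Tag.CLICK, 100),
    Event("catalog_item", "item-1", Tag.IMPRESSION | Tag.TRANSIENT, 100, expires=200),
    Event("browse_node", "node-17", Tag.CLICK | Tag.UNDISPLAYABLE, 120),
]
```

Here `purge_expired(events, now=250)` drops only the transient impression, and `visible_to_user(events)` hides only the "deleted" browse node event, which remains available to clients.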

The query set implemented by the event history server 32 preferably allows clients to retrieve the event objects for a given user or session based on event Subject, Value, Tag, and Time. For example, a client can request the event objects for all impressions (or all mouse click events) of a particular type of display element, or for all impressions (or all mouse click events) of a particular display element type and value. In addition, the query set preferably allows clients to specify an event time range (e.g., “last 10 days,” or “since Feb. 10, 2003”).

As mentioned above, the query set also supports queries of the following form: “does an event of type T and value V exist within the history of user X?” The type of the event may be specified in the query in terms of the general event type (e.g., mouse click or impression), the type of display element involved, or both. For example, a query of the form “does event of type=Web search query and Value=comet Halley exist in history of user X?” would reveal whether user X has conducted a general Web search using the query “comet Halley.” Further, the query set supports queries of the type “when did event of type T and value V occur in the history of user X?”
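The existence and time-of-occurrence queries can be sketched over an in-memory history. The dictionary layout, the subject codes, and the function names `matching_times` and `exists` are illustrative assumptions; the optional `since` bound corresponds to the time-range restriction described earlier.

```python
from typing import Dict, List, Optional


def matching_times(history: List[Dict], subject: str, value: str,
                   since: Optional[int] = None) -> List[int]:
    """Answer "when did an event of type T and value V occur?"."""
    return sorted(e["time"] for e in history
                  if e["subject"] == subject and e["value"] == value
                  and (since is None or e["time"] >= since))


def exists(history: List[Dict], subject: str, value: str,
           since: Optional[int] = None) -> bool:
    """Answer "does an event of type T and value V exist?"."""
    return bool(matching_times(history, subject, value, since))


user_x = [
    {"subject": "web_search_query", "value": "comet Halley", "time": 1045000000},
    {"subject": "web_search_url", "value": "http://example.com/", "time": 1045000100},
    {"subject": "web_search_query", "value": "comet Halley", "time": 1046000000},
]
searched_before = exists(user_x, "web_search_query", "comet Halley")
search_times = matching_times(user_x, "web_search_query", "comet Halley")
```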

III. Software Architecture

FIG. 2 illustrates the primary software components that run on the cache layer and storage layer servers 42, 46 in one embodiment. As illustrated, each storage layer server 46 runs service code 60 for processing queries and updates from the cache layer 40. The service code communicates with database software 62, such as software used to implement Berkeley databases. The database software 62 manages one or more databases of event data in disk drive storage 64. In one embodiment, each storage layer server 46 runs multiple pub-sub type service processes, each of which corresponds to a respective set or range of user IDs and exclusively accesses its own Berkeley database; this architecture greatly simplifies the management of the databases. Each storage layer server 46 additionally runs synchronization service code 68 that is responsible for maintaining the storage layer servers in synchronization. The synchronization service is used, for example, to synchronize the persistent databases after one storage layer server is temporarily taken off line.

Each cache layer server 42 runs cache layer service code 70 that accesses its respective cache 43 of event data. Event data is preferably stored in the cache 43 both by user ID (if the user is recognized) and session ID. As illustrated in FIG. 2, the cache layer service code 70 includes components 72, 74 for processing updates and queries received from clients.

As further depicted in FIG. 2, both types of servers 42, 46 also preferably include software components for generating and processing Bloom filters. As described below, each Bloom filter is a condensed representation of some portion of a user's event data, and may be used to evaluate whether the user has performed a particular action without having to retrieve the associated event objects. For example, in one embodiment, Bloom filters are used to reduce the processing and data retrieval needed to determine whether a given user has accessed a given URL.
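For readers unfamiliar with the data structure, a generic Bloom filter can be sketched as follows. This is a textbook construction, not the filter layout used by the described servers: the bit-array size, the number of hash functions, and the derivation of bit positions from a single SHA-256 digest are all assumptions.

```python
import hashlib


class BloomFilter:
    """Compact set summary: no false negatives, occasional false positives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive several bit positions from one digest (double hashing).
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


visited = BloomFilter()
visited.add("http://example.com/lotr")
```

A `False` answer from `might_contain` is definitive, so the server can skip retrieving event objects for that URL; a `True` answer may be a false positive and would still need confirmation against the stored events.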

IV. Example Search Personalization Applications

As indicated above, one application of the event history server 32 involves generating a personalized search results page identifying any search result items that were previously accessed by the particular user. The search results page may further indicate the time each such item was accessed. This feature may be applied both to internal catalog searches (in which case the search results page may, for example, indicate those catalog items for which the user has viewed an item detail page), and to web searches (in which case the search results page may indicate which of the external web pages have been viewed).

FIG. 3 illustrates an example web search results page according to one embodiment of this feature. In this example, the web search query “lord of the rings” has produced a search results listing in which each search result item (three shown) is an external web page or site having a corresponding URL. In one embodiment, the links contained within the search result listings are internal links (i.e., they point back to the web site system 30) that are used to immediately redirect the user's browser to the external page of interest; this allows the event history server 32 to record event objects describing the search results (external URLs) selected by the user for viewing. In another embodiment, the user's selections of external URLs are logged by firing a JavaScript component that detects and reports the user's selections.
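The internal-link approach can be sketched with the standard library's URL helpers. The tracking endpoint `https://www.example.com/redirect`, the `url` query parameter, the event dictionary layout, and both function names are hypothetical; the text says only that the link points back to the web site system and is then redirected.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical internal tracking endpoint.
TRACKING_BASE = "https://www.example.com/redirect"


def tracking_link(external_url: str) -> str:
    """Build the internal link placed in the search results listing."""
    return TRACKING_BASE + "?" + urlencode({"url": external_url})


def handle_redirect(request_url: str, user_id: str, event_log: list) -> str:
    """Record the selection, then return the URL to redirect the browser to."""
    target = parse_qs(urlparse(request_url).query)["url"][0]
    event_log.append({"user": user_id,
                      "subject": "web_search_url",
                      "value": target})
    return target


log = []
link = tracking_link("http://example.org/page?a=1&b=2")
destination = handle_redirect(link, "user-x", log)
```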

The first item 100 in the list of FIG. 3 includes the annotation “viewed,” indicating that the user previously accessed this particular URL. The user can thus readily identify those web pages that have already been viewed. Because the user's event history is stored on the server side (rather than merely on the user's computer 36), the “viewed” status may be properly indicated even if the user only viewed the particular web page from a different computer 36. By hovering the mouse cursor over this item 100, the user can also view the date he/she last accessed this URL, as shown by the mouse-over text “previously viewed on February 14.” The date-of-access text may alternatively be displayed in-line with the search results text 100. Various other types of information can be incorporated into the annotations based on the user's event history, such as the number of times the user has accessed the particular URL, and the text of a prior search query that uncovered this URL (if different from the current search query).

FIG. 4 illustrates the general process by which a search application 38 may interact with the event history server 32 to generate search results pages of the type shown in FIG. 3. It is assumed in this example that the user is recognized by the web site system 30. As depicted by block 110, the search application initially receives a search query, and executes the search to generate a set of search results. If the search is a general web search, each search result item will be in the form of a URL of a web page that is responsive to the search, and may be displayed together with textual excerpts extracted from such web pages (as in FIG. 3). As is known in the art, the execution of such searches typically involves comparing the search query to an index generated by a web crawler program. If the search is directed to an electronic catalog hosted by the web site system 30, each search result item may be in the form of an item description that may be clicked on to access a corresponding item detail page. In some embodiments, the search results returned in a given search may include both external web pages and items selected from an internally-hosted electronic catalog.

As depicted by block 112, the search application then sends a separate query to the event history server 32 for each search result item (or at least for those that are to be displayed on the current search results page) to determine whether the user previously accessed that item. For general web searches, each such query may be in the form of “has user X selected URL=<URL value> before?” For catalog searches, each query may be in the form of “has user X selected catalog item=<item ID> before?” As described below, the event history server 32 preferably uses Bloom filters to efficiently process these types of queries, although the use of Bloom filters may alternatively be omitted.

As depicted by block 114, for each search result item found to have been previously accessed by the user (if any), an additional query is preferably sent to the event history server 32 to request the date (time) of the user's last access to that item. Finally, as depicted in block 116, a search results page is generated with embedded viewed-item annotations of the type shown in FIG. 3. The search application may also take the viewed/not-viewed status of each search result item into consideration in ranking/ordering the search result items for display.
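The flow of blocks 112 through 116 can be sketched in Python as follows. The class `EventHistoryClient` is a hypothetical in-memory stand-in for the event history server 32, and its method names are assumptions for illustration; the disclosure does not define a programming interface.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional, Tuple


class EventHistoryClient:
    """Hypothetical in-memory stand-in for the event history server 32."""

    def __init__(self) -> None:
        # (user_id, url) -> time of the user's most recent access
        self._last_access: Dict[Tuple[str, str], datetime] = {}

    def record_access(self, user_id: str, url: str, when: datetime) -> None:
        key = (user_id, url)
        previous = self._last_access.get(key)
        if previous is None or when > previous:
            self._last_access[key] = when

    def has_selected(self, user_id: str, url: str) -> bool:
        """Answers "has user X selected URL=<URL value> before?"."""
        return (user_id, url) in self._last_access

    def last_access(self, user_id: str, url: str) -> Optional[datetime]:
        return self._last_access.get((user_id, url))


@dataclass
class AnnotatedResult:
    url: str
    viewed: bool
    last_viewed: Optional[datetime] = None


def annotate_results(
    history: EventHistoryClient, user_id: str, urls: List[str]
) -> List[AnnotatedResult]:
    annotated = []
    for url in urls:
        # Block 112: one membership query per search result item.
        if history.has_selected(user_id, url):
            # Block 114: follow-up query for the time of the last access.
            annotated.append(AnnotatedResult(url, True, history.last_access(user_id, url)))
        else:
            annotated.append(AnnotatedResult(url, False))
    # Block 116 would render these annotations into the results page.
    return annotated
```

The same list of annotated results could also feed the ranking step, for example by sorting on the `viewed` flag before rendering.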

As will be apparent, the process depicted by FIG. 4 can be varied such that the event history server 32 is not actually accessed in response to the search queries. For example, to reduce the load on the event history server 32, the event data indicative of specific search result URLs selected by specific users can be periodically retrieved from the event history server 32 and stored on a separate “search personalization” server. This separate server can then be assigned to the task of responding to requests of the type “has user X accessed external URL Y?” The same is true for the process of FIG. 5, described below.

The search application's user interface may also provide an option for the user to restrict the scope of the search to items previously viewed, items not previously viewed, or items viewed within a particular time period (e.g., the last seven days). This feature may be implemented using the same process flow as in FIG. 4, except that the query results returned by the event history server 32 will be used to determine which search result items are to be displayed.

As will be apparent, the foregoing search results personalization features, as well as those described below with reference to FIG. 5, may also be applied to searches for other types of items. For example, in the context of an online auction site, the results of an auction search may be annotated to indicate which of the located auctions have been viewed by the user, when each auction was viewed, and possibly whether the user has submitted a bid on each such auction. Similarly, the results of a search of bulletin board postings or blog (web log) postings may be annotated to indicate if/when specific postings were viewed by the user.

Referring again to FIG. 3, the second search result item 120 illustrates another feature that may be implemented using the event history server 32. This feature involves determining whether the user previously submitted the same search query, and if so, whether any new search result items (i.e., items not present in the prior search results set) have been found. In this particular example, the search results page reveals that the user previously conducted the same search on February 14, and that the item corresponding to the URL “lordotrings.com” is a new item that did not come up in the prior search.

FIG. 5 illustrates the general process by which a search application 38 may interact with the event history server 32 to implement this “new-item annotations” feature. This process may be combined with that of FIG. 4 to generate annotated search results pages of the type shown in FIG. 3. As illustrated in FIG. 5, the search application initially receives and executes the search query (block 130), and queries the event history server 32 to determine whether the user has previously submitted the same query (block 132). If the user has not submitted this query, the search results page is generated (block 140) without adding any “new item” annotations.

If the same search query was previously submitted, the event history server 32 is again queried for the time of the last submission (block 134). In addition, as depicted in block 136, the event history server 32 is queried to determine which of the current search result items, if any, have not been displayed to the user (i.e., have not been the subject of an impression event); this query may optionally be limited in scope to impressions occurring at the time of or shortly after the prior search. As depicted by block 138, if any new search result items exist in the current search result set, they are annotated as shown in FIG. 3 (item 120). The relevance rankings of the new items may also be augmented to increase the likelihood that the new items will be displayed.
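The new-item logic of blocks 132 through 140 can be sketched in Python as follows. The class `SearchHistory` is a hypothetical in-memory stand-in for the event history server 32, and the sketch does not implement the optional narrowing of impressions to the time of the prior search.

```python
from datetime import datetime
from typing import Dict, List, Optional, Set, Tuple


class SearchHistory:
    """Hypothetical stand-in for the event history server 32."""

    def __init__(self) -> None:
        # (user_id, query) -> time the query was last submitted
        self._query_times: Dict[Tuple[str, str], datetime] = {}
        # user_id -> URLs that have been the subject of an impression event
        self._impressions: Dict[str, Set[str]] = {}

    def record_search(
        self, user_id: str, query: str, shown_urls: List[str], when: datetime
    ) -> None:
        self._query_times[(user_id, query)] = when
        self._impressions.setdefault(user_id, set()).update(shown_urls)

    def last_submitted(self, user_id: str, query: str) -> Optional[datetime]:
        return self._query_times.get((user_id, query))

    def not_yet_shown(self, user_id: str, urls: List[str]) -> List[str]:
        seen = self._impressions.get(user_id, set())
        return [url for url in urls if url not in seen]


def new_item_annotations(
    history: SearchHistory, user_id: str, query: str, current_urls: List[str]
) -> Tuple[Optional[datetime], List[str]]:
    """Return (time of the prior search or None, URLs to annotate as new)."""
    # Blocks 132 and 134: was this query submitted before, and when?
    previous = history.last_submitted(user_id, query)
    if previous is None:
        # Block 140: generate the page without "new item" annotations.
        return None, []
    # Blocks 136 and 138: items with no prior impression are marked as new.
    return previous, history.not_yet_shown(user_id, current_urls)
```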

One variation of the method shown in FIG. 5 is to allow the user to explicitly limit the (repeat) search to items that did not come up in a prior execution of the same search query. This option may, for example, be provided as a check box on the web site's search page.

IV. Use of Bloom Filters to Determine Whether User Previously Viewed or Accessed a Given Element

As mentioned above, Bloom filters may be used by the event history server 32 to reduce the need for persistent data retrieval when responding to a query of the form “does event of type T and value V exist in history of user X?” For example, Bloom filters may be used to determine whether a particular user has viewed or selected a particular URL.

By way of background, a Bloom filter is a bit sequence or array generated according to a set of hash functions. Bloom filters are used to quickly test whether a particular item is a member of a large set of items. One common application for Bloom filters is to test whether a given object (as identified by the object's URL) is currently stored in a cache of web pages. Specifically, when an object is added to the cache, the hash functions are applied to the object's URL to determine which of the bits in the Bloom filter are to be turned ON. When an object is requested, these hash functions are again applied to the requested object's URL, and a test is then performed to determine whether all of the corresponding bits in the Bloom filter are turned ON. If one or more of the bits are not ON, the object is not stored in the cache. If, on the other hand, all of the bits are ON, there is a very high likelihood that the requested object is in the cache. Thus, “false positives” or “false hits” are possible, but “false negatives” or “false misses” generally are not.
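A minimal Bloom filter along these lines can be sketched in Python. The default size (4 Kilobytes), the use of four hash functions, and the derivation of those functions by salting SHA-256 are illustrative assumptions, not details taken from this disclosure.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Illustrative Bloom filter keyed on strings such as URLs."""

    def __init__(self, num_bits: int = 4096 * 8, num_hashes: int = 4) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Assumed scheme: derive each hash function by salting SHA-256
        # with the index of the hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        """Turn ON every bit selected by the hash functions."""
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        """False means definitely absent; True means very likely present."""
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )
```

Because bits are only ever turned ON, an item that was added always tests positive, which is why false negatives do not arise in this basic form.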

According to one aspect of the invention, one or more Bloom filters are generated for a given user to describe some aspect or segment of that user's event history. The Bloom filters are preferably generated and stored by the persistent storage layer 44 (FIG. 1), and are passed to the cache layer 40 for purposes of responding to queries. Different Bloom filters may be generated for a given user for responding to different types of queries. For example, different types of Bloom filters may be generated for different types of activity (e.g., URL accesses, URL impressions, URL accesses plus impressions, catalog item accesses, mouse-over events, etc.). In addition, different Bloom filters may be generated for different time periods.

FIG. 6 illustrates the general process by which a Bloom filter may be used to test for URL accesses by a recognized user, User X. Although this diagram is specific to URL accesses, the illustrated process flow also applies to other types of event activity. In step 1 of FIG. 6, a storage layer server 46 generates one or more Bloom filters for User X. This step may, for example, be performed in response to receiving an associated query or update for User X. To generate a Bloom filter specifically for URL accesses, the storage layer service code (FIG. 2) initially retrieves URL access events for User X, applies a set of hash functions to each accessed URL, and sets the associated Bloom filter bits. As discussed below, the size of the Bloom filter may be selected to enable storage of a desired number of events while maintaining a false-positive rate below a desired threshold. Bloom filters generated by one storage layer server 46 may be passed to the other storage layer server(s) by the synchronization service. The Bloom filters may additionally or alternatively be generated by the cache layer servers 42.

In step 2 of FIG. 6, a cache layer server receives a query of the form “has User X accessed URL Y?” In the context of the search results annotation features described above, many such queries may be received for a given user for purposes of generating a single search results page. In step 3, the cache layer server 42 retrieves the relevant Bloom filter from one of the storage layer servers 46 if the Bloom filter is not already stored in the cache 43.

In step 4, the cache layer server 42 tests the relevant Bloom filter to see if the corresponding bits for the URL are ON. If one or more of the bits are OFF (meaning that no accesses to URL Y exist in the relevant event history of User X), the cache layer server 42 returns an answer of NO (step 5), without passing the query to the persistent storage layer. If, on the other hand, all of the bits are ON (meaning that the URL access very likely exists within User X's event history), the query is passed to one of the storage layer servers 46 (step 6) to check User X's actual event data for the URL access. In step 7, the storage layer server 46 returns a response to the query via the cache layer server 42. As an alternative to querying the storage layer server (step 6), the cache layer server can be designed to simply return a YES response when the Bloom filter test is positive, although this approach may cause inaccurate results to be presented to users on rare occasions.
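Steps 1 through 7 of FIG. 6 can be sketched in Python as follows. The classes `StorageLayer` and `CacheLayer`, the filter size, the hash scheme, and the use of a Python integer as the bit array are all assumptions chosen to keep the sketch short; they are not details of the described servers 42, 46.

```python
import hashlib
from typing import Dict, Iterable, Iterator, Set

NUM_BITS = 4096 * 8   # assumed filter size (4 Kilobytes)
NUM_HASHES = 4        # assumed number of hash functions


def bit_positions(url: str) -> Iterator[int]:
    for i in range(NUM_HASHES):
        digest = hashlib.sha256(f"{i}:{url}".encode("utf-8")).digest()
        yield int.from_bytes(digest[:8], "big") % NUM_BITS


def build_filter(urls: Iterable[str]) -> int:
    """Step 1: hash each accessed URL and set the associated bits."""
    bloom = 0  # a Python int serves as the bit array
    for url in urls:
        for pos in bit_positions(url):
            bloom |= 1 << pos
    return bloom


def filter_matches(bloom: int, url: str) -> bool:
    return all((bloom >> pos) & 1 for pos in bit_positions(url))


class StorageLayer:
    """Hypothetical stand-in for a storage layer server 46."""

    def __init__(self) -> None:
        self.accessed: Dict[str, Set[str]] = {}
        self.lookups = 0  # counts queries that reach the event data

    def bloom_for(self, user_id: str) -> int:
        return build_filter(self.accessed.get(user_id, set()))

    def has_accessed(self, user_id: str, url: str) -> bool:
        self.lookups += 1
        return url in self.accessed.get(user_id, set())


class CacheLayer:
    """Hypothetical stand-in for a cache layer server 42."""

    def __init__(self, storage: StorageLayer) -> None:
        self.storage = storage
        self.filters: Dict[str, int] = {}

    def has_accessed(self, user_id: str, url: str) -> bool:
        if user_id not in self.filters:                        # step 3
            self.filters[user_id] = self.storage.bloom_for(user_id)
        if not filter_matches(self.filters[user_id], url):     # step 4
            return False                                       # step 5
        return self.storage.has_accessed(user_id, url)         # steps 6 and 7
```

Under these assumptions, a negative filter test is answered entirely within the cache layer, and only positive tests incur a lookup against the stored event data.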

As updates reflective of URL accesses are thereafter received for User X, the cache layer server 42 (but preferably not the storage layer server 46) updates its copy of the associated Bloom filter to reflect these updates. For example, if User X selects a search result item (URL) from a web search results page of the type shown in FIG. 3, the associated Bloom filter will be updated within the cache 43 to set the bits associated with the selected URL. Thus, the Bloom filter used by the cache layer to respond to queries reflects the most recent browsing activities of the relevant user. At some point, the Bloom filter may be purged from the cache 43 due to inactivity, at which time the updated Bloom filter is written to the persistent storage layer (in place of the original version) for further use.

At some point, a given Bloom filter may reach its capacity, meaning that it cannot store additional events without exceeding a desired average false positive rate. At this point, the Bloom filter may be replaced with a larger Bloom filter (e.g., 8 Kilobytes rather than 4 Kilobytes) in order to provide greater event capacity.
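The relationship between filter size and event capacity can be estimated with the standard approximation for the Bloom filter false positive rate, p ≈ (1 - e^(-kn/m))^k, where m is the number of bits, k the number of hash functions, and n the number of stored events. The Python sketch below applies that approximation; the choice of four hash functions, the 1% threshold used in the usage note, and the size-doubling policy are illustrative assumptions.

```python
import math


def false_positive_rate(num_bits: int, num_hashes: int, num_items: int) -> float:
    """Standard approximation of the false positive rate after num_items inserts."""
    return (1.0 - math.exp(-num_hashes * num_items / num_bits)) ** num_hashes


def capacity(num_bits: int, num_hashes: int, max_rate: float) -> int:
    """Largest item count whose estimated false positive rate stays within max_rate."""
    # Invert p = (1 - e^(-kn/m))^k to obtain n = -(m/k) * ln(1 - p^(1/k)).
    return int(
        -(num_bits / num_hashes) * math.log(1.0 - max_rate ** (1.0 / num_hashes))
    )


def next_size_bytes(
    current_bytes: int, num_hashes: int, num_items: int, max_rate: float
) -> int:
    """Double the filter size until the estimated rate meets the threshold."""
    size = current_bytes
    while false_positive_rate(size * 8, num_hashes, num_items) > max_rate:
        size *= 2
    return size
```

For example, `next_size_bytes(4096, 4, 5000, 0.01)` selects 8192 bytes under these assumptions, because 5,000 events exceed the estimated capacity of a 4 Kilobyte filter at a 1% threshold but not that of an 8 Kilobyte filter. For a fixed number of hash functions, the estimated capacity scales linearly with the filter size.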

The foregoing description focuses on the generation and use of Bloom filters for recognized users. Bloom filters may also be generated for unrecognized users by the cache layer servers 42. For example, at the outset of a browsing session, the assigned cache layer server 42 may generate a Bloom filter for a user, and may thereafter update the Bloom filter with new events for that user. This “session-specific” Bloom filter may be used to respond to queries in the same way as described above.

V. Browser-Based Reporting of Event Data

FIG. 7 illustrates an embodiment in which the event history server 32 records event data reported by a browser-based event reporting component 160. The event reporting component 160 runs as a component of a web browser 162 on some or all of the computers 36 of users of the event history server 32. As illustrated, the event reporting component 160 is preferably implemented as part of a browser toolbar component 164. The toolbar component 164 may, for example, be provided as an optional browser plug-in that can be downloaded and installed by users. The event reporting component 160 may alternatively be implemented as a standalone browser plug-in, as part of a plug-in other than a toolbar, or as part of the native code of the web browser 162.

The browser-based event reporting component 160 preferably reports event data for all web sites and pages accessed by the user. For example, the event reporting component 160 may report every mouse click or other selection event on every web page accessed by the user. The event reporting component 160 may also report other types of browsing events, such as mouse-over events, impressions, selections of the “back” button on the web browser 162, etc.

The browser-based event reporting component 160 may take the place of the server-based event reporting component 35 of FIG. 1. Alternatively, both types of event reporting components 35, 160 may be used within a given system. For example, for users that do not have the toolbar 164 installed, the event history server 32 may only capture data reported by the server-based event reporting component 35, in which case the event data may only reflect events that occur during browsing of one or more specific web sites. For users that have the toolbar 164 installed on their respective computers 36, the captured event data may also extend to other web sites.

The event data collected from the browser-based event reporting component 160 may be used to provide a variety of different personalization services to users. For example, a service may be provided for allowing users to view a listing of all web sites they have respectively accessed that satisfy some user-specified criteria. Using this service, users may, for example, view listings of all payment transactions they have respectively made on the web, or view a history of all travel-related web sites they have accessed. The data fields included within the event objects may be supplemented as needed to implement such a service.

As depicted in FIG. 7, personalized toolbar content may be generated by a server-side toolbar personalization application 170. This application 170 may use event data retrieved from the event history server 32 to generate context-sensitive toolbar messages for display to users. For example, when a user accesses a particular web site, the toolbar personalization application 170 may retrieve and display a history of transactions performed by the user on that web site. Further, the toolbar may be personalized with real time information about all of the web sites or pages visited during the current browsing session.

VI. Other Personalization Applications

As will be recognized, numerous other types of personalization applications and features are made possible by the event history server 32. As mentioned above, one such application involves allowing users to view, organize, and possibly annotate their respective event histories. This may be accomplished in part by providing a user interface, such as a set of web pages, through which users can create event history folders, and select events to add to such folders. An event search engine may also be provided through which users can search their respective event histories by event type, event value, event time-of-occurrence, and various other criteria. As mentioned above, users may also be permitted to “delete” specific events from their respective event histories.

Although this invention has been described in terms of certain preferred embodiments and applications, other embodiments and applications that are apparent to those of ordinary skill in the art, including embodiments which do not provide all of the features and advantages set forth herein, are also within the scope of this invention. Accordingly, the scope of the present invention is intended to be defined only by reference to the appended claims.

1. A system that provides functionality for conducting searches, the system comprising: a search engine system that is responsive to search queries received over a network from user computing devices of users by generating and returning search results pages listing corresponding search results, said search results including uniform resource locators of responsive resources, said search engine system providing functionality, including a user interface, that enables users to annotate particular search results, and to subsequently recall such annotations for viewing; and a server that persistently stores event data in association with particular users of the search engine system, said event data comprising search result annotations created by particular users, said server including a query interface that enables the search engine system to selectively retrieve the persistently stored event data associated with particular users.
2. The system of claim 1, wherein the search engine system additionally provides functionality for the users to publish their respective search result annotations to other users.
3. The system of claim 1, wherein the user interface includes functionality for users to view and organize their respective event histories.
4. The system of claim 1, wherein the server is configured to store a search result annotation submitted by a user as an event object that is separately retrievable via the query interface.
5. The system of claim 4, wherein the event object includes an event type identifier and an event timestamp.
6. The system of claim 1, wherein the server is additionally configured to store event data descriptive of search result viewing events, and the search engine system is configured to personalize search results page for a user with annotations indicative of when the user previously viewed particular search results, said annotations based on the event data descriptive of search result viewing events.
7. The system of claim 1, wherein the server stores event data for each of a plurality of defined event types, including an annotation event type.
8. The system of claim 7, wherein the query interface provides functionality for selectively retrieving the event data based on at least user ID, event type, and event time-of-occurrence.
9. The system of claim 1, wherein the server comprises at least one persistent storage layer machine that persistently stores the event data, and comprises a plurality of cache layer machines that cache event data of particular users during browsing sessions of such users, wherein the search engine system is configured to send queries to the cache layer machines to request event data.
10. A method, comprising: receiving a search query submission from a user computing device associated with a user; executing the search query to identify a plurality of search results that are responsive to the search query; outputting a representation of the search results to the user computing device for presentation to the user; providing an option for the user to annotate the search results, and receiving a resulting search result annotation submitted by the user; persistently storing the search result annotation in association with the user; subsequently retrieving the persistently stored search result annotation by execution of a query; and outputting the search result annotation for display to the user; said method performed in its entirety by a computer-implemented system that comprises a plurality of server machines.
11. The method of claim 10, further comprising, by said computer-implemented system, providing an option for the user to publish the search result annotation to other users.
12. The method of claim 10, further comprising, by said computer-implemented system, publishing the search query annotation to other users.
13. The method of claim 10, wherein persistently storing the search result annotation comprises persistently storing, on a server, an event object that includes at least the search result annotation and an event timestamp.
14. The method of claim 13, wherein retrieving the search result annotation comprises receiving said query from an application, and, in response to the query, returning the event object to the application.
15. The method of claim 10, further comprising recording, in association with the user, event data that identifies specific search result URLs selected by the user for viewing, and using said event data to personalize search results pages for the user.
16. The method of claim 10, wherein the computer-implemented system is a web site system that hosts a web site that provides functionality for conducting Internet searches.
17. A system, comprising a computer system comprising a plurality of servers, said computer system configured to perform a process that comprises: receiving a search query submission from a user computing device associated with a user; executing the search query to identify a plurality of search results that are responsive to the search query; outputting a representation of the search results to the user computing device for presentation to the user; providing an option for the user to annotate the search results, and receiving a resulting search result annotation submitted by the user; persistently storing the search result annotation in association with the user; subsequently retrieving the persistently stored search result annotation; and outputting the search result annotation for display to the user.
18. The system of claim 17, wherein the computer system is additionally configured to provide an option for the user to publish the search result annotation to other users.
19. The system of claim 17, wherein the computer system is additionally configured to publish the search query annotation to other users.